21 research outputs found

    Agrupamiento, predicción y clasificación ordinal para series temporales utilizando técnicas de machine learning: aplicaciones

    Get PDF
    In the last years, there has been an increase in the number of fields improving their standard processes by using machine learning (ML) techniques. The main reason for this is that the vast amount of data generated by these processes is difficult to be processed by humans. Therefore, the development of automatic methods to process and extract relevant information from these data processes is of great necessity, giving that these approaches could lead to an increase in the economic benefit of enterprises or to a reduction in the workload of some current employments. Concretely, in this Thesis, ML approaches are applied to problems concerning time series data. Time series is a special kind of data in which data points are collected chronologically. Time series are present in a wide variety of fields, such as atmospheric events or engineering applications. Besides, according to the main objective to be satisfied, there are different tasks in the literature applied to time series. Some of them are those on which this Thesis is mainly focused: clustering, classification, prediction and, in general, analysis. Generally, the amount of data to be processed is huge, arising the need of methods able to reduce the dimensionality of time series without decreasing the amount of information. In this sense, the application of time series segmentation procedures dividing the time series into different subsequences is a good option, given that each segment defines a specific behaviour. Once the different segments are obtained, the use of statistical features to characterise them is an excellent way to maximise the information of the time series and simultaneously reducing considerably their dimensionality. In the case of time series clustering, the objective is to find groups of similar time series with the idea of discovering interesting patterns in time series datasets. In this Thesis, we have developed a novel time series clustering technique. The aim of this proposal is twofold: to reduce as much as possible the dimensionality and to develop a time series clustering approach able to outperform current state-of-the-art techniques. In this sense, for the first objective, the time series are segmented in order to divide the them identifying different behaviours. Then, these segments are projected into a vector of statistical features aiming to reduce the dimensionality of the time series. Once this preprocessing step is done, the clustering of the time series is carried out, with a significantly lower computational load. This novel approach has been tested on all the time series datasets available in the University of East Anglia and University of California Riverside (UEA/UCR) time series classification (TSC) repository. Regarding time series classification, two main paths could be differentiated: firstly, nominal TSC, which is a well-known field involving a wide variety of proposals and transformations applied to time series. Concretely, one of the most popular transformation is the shapelet transform (ST), which has been widely used in this field. The original method extracts shapelets from the original time series and uses them for classification purposes. Nevertheless, the full enumeration of all possible shapelets is very time consuming. Therefore, in this Thesis, we have developed a hybrid method that starts with the best shapelets extracted by using the original approach with a time constraint and then tunes these shapelets by using a convolutional neural network (CNN) model. Secondly, time series ordinal classification (TSOC) is an unexplored field beginning with this Thesis. In this way, we have adapted the original ST to the ordinal classification (OC) paradigm by proposing several shapelet quality measures taking advantage of the ordinal information of the time series. This methodology leads to better results than the state-of-the-art TSC techniques for those ordinal time series datasets. All these proposals have been tested on all the time series datasets available in the UEA/UCR TSC repository. With respect to time series prediction, it is based on estimating the next value or values of the time series by considering the previous ones. In this Thesis, several different approaches have been considered depending on the problem to be solved. Firstly, the prediction of low-visibility events produced by fog conditions is carried out by means of hybrid autoregressive models (ARs) combining fixed-size and dynamic windows, adapting itself to the dynamics of the time series. Secondly, the prediction of convective cloud formation (which is a highly imbalance problem given that the number of convective cloud events is much lower than that of non-convective situations) is performed in two completely different ways: 1) tackling the problem as a multi-objective classification task by the use of multi-objective evolutionary artificial neural networks (MOEANNs), in which the two conflictive objectives are accuracy of the minority class and the global accuracy, and 2) tackling the problem from the OC point of view, in which, in order to reduce the imbalance degree, an oversampling approach is proposed along with the use of OC techniques. Thirdly, the prediction of solar radiation is carried out by means of evolutionary artificial neural networks (EANNs) with different combinations of basis functions in the hidden and output layers. Finally, the last challenging problem is the prediction of energy flux from waves and tides. For this, a multitask EANN has been proposed aiming to predict the energy flux at several prediction time horizons (from 6h to 48h). All these proposals and techniques have been corroborated and discussed according to physical and atmospheric models. The work developed in this Thesis is supported by 11 JCR-indexed papers in international journals (7 Q1, 3 Q2, 1 Q3), 11 papers in international conferences, and 4 papers in national conferences

    Time series ordinal classification via shapelets

    Get PDF
    Nominal time series classification has been widely developed over the last years. However, to the best of our knowledge, ordinal classification of time series is an unexplored field, and this paper proposes a first approach in the context of the shapelet transform (ST). For those time series dataset where there is a natural order between the labels and the number of classes is higher than 2, nominal classifiers are not capable of achieving the best results, because the models impose the same cost of misclassification to all the errors, regardless the difference between the predicted and the ground-truth. In this sense, we consider four different evaluation metrics to do so, three of them of an ordinal nature. The first one is the widely known Information Gain (IG), proved to be very competitive for ST methods, whereas the remaining three measures try to boost the order information by refining the quality measure. These three measures are a reformulation of the Fisher score, the Spearman's correlation coefficient (ρ), and finally, the Pearson's correlation coefficient (R 2 ). An empirical evaluation is carried out, considering 7 ordinal datasets from the UEA & UCR time series classification repository, 4 classifiers (2 of them of nominal nature, whereas the other 2 are of ordinal nature) and 2 performance measures (correct classification rate, CCR, and average mean absolute error, AMAE). The results show that, for both performance metrics, the ST quality metric based on R 2 is able to obtain the best results, specially for AMAE, for which the differences are statistically significant in favour of R 2

    An Evolutionary Artificial Neural Network approach for spatio-temporal wave height time series reconstruction

    Get PDF
    This paper proposes a novel methodology for recovering missing time series data, a crucial task for subsequent Machine Learning (ML) analyses. The methodology is specifically applied to Significant Wave Height (SWH) time series in the field of marine engineering. The proposed approach involves two phases. Firstly, the SWH time series for each buoy is independently reconstructed using three transfer function models: regression-based, correlation-based, and distance-based. The distance-based transfer function exhibits the best overall performance. Secondly, Evolutionary Artificial Neural Networks (EANNs) are utilised for the final recovery of each time series, using as inputs highly correlated buoys that have been intermediately recovered. The EANNs are evolved considering two metrics, the novel squared error relevance area, which balances the importance of extreme and around-mean values, and the well-known mean squared error. The study considers SWH time series data from 15 buoys in two coastal zones in the United States. The results demonstrate that the distance-based transfer function is generally the best transfer function, and that EANNs outperform a range of state-of-the-art ML techniques in 12 out of the 15 buoys, with a number of connections comparable to linear models. Furthermore, the proposed methodology outperforms the two most popular approaches for time series reconstruction, BRITS and SAITS, for all buoys except one. Therefore, the proposed methodology provides a promising approach, which may be applied to time series from other fields, such as wind or solar energy farms in the field of green energy

    Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression

    Full text link
    Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, these two proposals (DrCIF and FreshPRINCE) models are the only ones that significantly outperform the standard rotation forest regressor.Comment: 19 pages, 21 figures, 6 tables. Appendix include

    Convolutional and Deep Learning based techniques for Time Series Ordinal Classification

    Full text link
    Time Series Classification (TSC) covers the supervised learning problem where input data is provided in the form of series of values observed through repeated measurements over time, and whose objective is to predict the category to which they belong. When the class values are ordinal, classifiers that take this into account can perform better than nominal classifiers. Time Series Ordinal Classification (TSOC) is the field covering this gap, yet unexplored in the literature. There are a wide range of time series problems showing an ordered label structure, and TSC techniques that ignore the order relationship discard useful information. Hence, this paper presents a first benchmarking of TSOC methodologies, exploiting the ordering of the target labels to boost the performance of current TSC state-of-the-art. Both convolutional- and deep learning-based methodologies (among the best performing alternatives for nominal TSC) are adapted for TSOC. For the experiments, a selection of 18 ordinal problems from two well-known archives has been made. In this way, this paper contributes to the establishment of the state-of-the-art in TSOC. The results obtained by ordinal versions are found to be significantly better than current nominal TSC techniques in terms of ordinal performance metrics, outlining the importance of considering the ordering of the labels when dealing with this kind of problems.Comment: 13 pages, 9 figures, 3 table

    A hybrid approach to classification with shapelets

    Get PDF
    Shapelets are phase independent subseries that can be used to discriminate between time series. Shapelets have proved to be very effective primitives for time series classification. The two most prominent shapelet based classification algorithms are the shapelet transform (ST) and learned shapelets (LS). One significant difference between these approaches is that ST is data driven, whereas LS searches the entire shapelet space through stochastic gradient descent. The weakness of the former is that full enumeration of possible shapelets is very time consuming. The problem with the latter is that it is very dependent on the initialisation of the shapelets. We propose hybridising the two approaches through a pipeline that includes a time constrained data driven shapelet search which is then passed to a neural network architecture of learned shapelets for tuning. The tuned shapelets are extracted and formed into a transform, which is then classified with a rotation forest. We show that this hybrid approach is significantly better than either approach in isolation, and that the resulting classifier is not significantly worse than a full shapelet search

    Using machine learning methods to determine a typology of patients with HIV-HCV infection to be treated with antivirals

    Get PDF
    Several European countries have established criteria for prioritising initiation of treatment in patients infected with the hepatitis C virus (HCV) by grouping patients according to clinical characteristics. Based on neural network techniques, our objective was to identify those factors for HIV/HCV co-infected patients (to which clinicians have given careful consideration before treatment uptake) that have not being included among the prioritisation criteria. This study was based on the Spanish HERACLES cohort (NCT02511496) (April-September 2015, 2940 patients) and involved application of different neural network models with different basis functions (product-unit, sigmoid unit and radial basis function neural networks) for automatic classification of patients for treatment. An evolutionary algorithm was used to determine the architecture and estimate the coefficients of the model. This machine learning methodology found that radial basis neural networks provided a very simple model in terms of the number of patient characteristics to be considered by the classifier (in this case, six), returning a good overall classification accuracy of 0.767 and a minimum sensitivity (for the classification of the minority class, untreated patients) of 0.550. Finally, the area under the ROC curve was 0.802, which proved to be exceptional. The parsimony of the model makes it especially attractive, using just eight connections. The independent variable "recent PWID" is compulsory due to its importance. The simplicity of the model means that it is possible to analyse the relationship between patient characteristics and the probability of belonging to the treated group

    Gamifying the Classroom for the Acquisition of Skills Associated with Machine Learning: A Two-Year Case Study

    Get PDF
    Machine learning (ML) is the field of science that combines knowledge from artificial intelligence, statistics and mathematics intending to give computers the ability to learn from data without being explicitly programmed to do so. It falls under the umbrella of Data Science and is usually developed by Computer Engineers becoming what is known as Data Scientists. Developing the necessary competences in this field is not a trivial task, and applying innovative methodologies such as gamification can smooth the initial learning curve. In this context, communities offering platforms for open competitions such as Kaggle can be used as a motivating element. The main objective of this work is to gamify the classroom with the idea of providing students with valuable hands-on experience by means of addressing a real problem, as well as the possibility to cooperate and compete simultaneously to acquire ML competences. The innovative teaching experience carried out during two years meant a great motivation, an improvement of the learning capacity and a continuous recycling of knowledge to which Computer Engineers are faced to

    Hackathon in teaching: Machine Learning applied to Life Sciences

    Get PDF
    La programación ha sido tradicionalmente una competencia perteneciente a las ingenierías, que recientemente está adquiriendo una importancia significativa en áreas como Ciencias de la Vida, donde resulta fundamental para la resolución de problemas de análisis de datos. Este trabajo es un caso de estudio enmarcado en la necesidad de mejorar las habilidades, sobre análisis de datos en el alumnado de Ciencias de la Vida y de la base temática en los estudiantes de ingeniería. Mediante la herramienta del hackathon y el trabajo en equipo, se combinó al alumnado de ambas disciplinas y se le enfrentó a una serie de problemas de análisis de datos. Se establecieron equipos de trabajo que recibieron una formación previa al comienzo de la competición. De cada equipo, se valoró la metodología empleada para la obtención de los datos, su análisis, interpretación de resultados, y exposición de las diversas tareas. Se hizo un análisis descriptivo de los resultados del Proyecto mediante encuestas al alumnado, así como su percepción sobre las actividades realizadas. El Proyecto ha conseguido que el alumnado resuelva los problemas planteados, difícilmente abordables con equipos unidisciplinares, generando un aprendizaje común y una experiencia multidisciplinar altamente satisfactoria tanto para el alumnado como para el profesorado.Programming has traditionally been an engineering competence, but recently it is acquiring significant importance in several areas, such as Life Sciences, which is considered essential for problem-solving based on data analysis. This work is a case study framed within the need to improve not only the data analysis skills of life science students, but also the biological background concerning the given issue of engineering students. Using hackathon and teamwork-based tools, students from both disciplines have been made and challenged with a series of problems in the area of Life Sciences. To solve these problems, we established work teams trained before the competition's beginning. Their results were assessed concerning the approach to obtain the data, perform the analysis, and finally interpret and present the results to solve the challenges. The project outcomes were assessed using structured surveys for students and their overall perception. The project succeeded, meaning students solved the proposed problems and achieved the activity's goals. These goals would have been difficult to address with teams composed of students from the same field of study. The hackathon succeeded in generating a shared learning and a multidisciplinary experience for their professional training, being highly rewarding for both students and faculty members
    corecore